Put three and three together: Triangle-driven community detection
Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its applications in many fields, such as biology, social networks, or network traffic analysis. Although the existing metrics used to quantify the quality of a community work well in general, under some circumstances they fail to correctly capture that notion. The main reason is that these metrics consider the internal community edges as a set but ignore how these edges actually connect the vertices of the community. We propose Weighted Community Clustering (WCC), a new community metric that takes the triangle, instead of the edge, as the minimal structural motif indicating the presence of a strong relation in a graph. We theoretically analyse WCC in depth and formally prove, by means of a set of properties, that the maximisation of WCC guarantees communities with cohesion and structure. In addition, we propose Scalable Community Detection (SCD), a community detection algorithm based on WCC, designed to be fast and scalable on SMP machines, and we show experimentally on real datasets that WCC correctly captures the concept of community in social networks. Finally, using ground-truth data, we show that SCD provides better quality than the best state-of-the-art disjoint community detection algorithms while running faster.
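The triangle-as-motif idea can be illustrated with a toy computation. The sketch below is only an illustration of counting intra-community triangles, not the paper's actual WCC formula (which is defined in the full text): for each vertex, it counts how many triangles close entirely inside its community.

```python
from itertools import combinations

def triangles_closed_in_community(adj, community):
    """For each vertex in `community`, count triangles whose three
    vertices all lie inside the community. Illustrative only; the
    actual WCC metric is defined in the paper."""
    comm = set(community)
    counts = {}
    for v in comm:
        inside = [u for u in adj[v] if u in comm]
        # A triangle closes when two community neighbours are adjacent.
        counts[v] = sum(1 for a, b in combinations(inside, 2) if b in adj[a])
    return counts

# Toy graph: a 4-clique {0, 1, 2, 3} plus a pendant vertex 4.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(triangles_closed_in_community(adj, [0, 1, 2, 3]))
# → {0: 3, 1: 3, 2: 3, 3: 3}: each clique vertex lies on 3 intra-community triangles
```

A vertex such as 4, connected by a single edge, would close no triangles, which is the kind of weak attachment an edge-based metric cannot distinguish from a strong one.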
Towards a property graph generator for benchmarking
The use of synthetic graph generators is a common practice among
graph-oriented benchmark designers, as it allows obtaining graphs with the
required scale and characteristics. However, finding a graph generator that
accurately fits the needs of a given benchmark is very difficult, thus
practitioners end up creating ad-hoc ones. Such a task is usually
time-consuming, and often leads to reinventing the wheel. In this paper, we
introduce the conceptual design of DataSynth, a framework for property graphs
generation with customizable schemas and characteristics. The goal of DataSynth
is to assist benchmark designers in generating graphs efficiently and at scale,
saving from implementing their own generators. Additionally, DataSynth
introduces novel features barely explored so far, such as modeling the
correlation between properties and the structure of the graph. This is achieved
by a novel property-to-node matching algorithm for which we present promising
preliminary results.
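The abstract does not detail the property-to-node matching algorithm. As a rough illustration of the underlying concept of correlating a property with graph structure, one could assign sorted property values to nodes in degree order, so that value rank tracks degree rank. This is a hypothetical stand-in, not DataSynth's actual algorithm.

```python
def assign_correlated_property(degrees, values):
    """Toy property-to-node matching: pair sorted property values with
    nodes sorted by degree, so value rank correlates with degree rank.
    Illustrative stand-in, NOT DataSynth's actual matching algorithm."""
    nodes_by_degree = sorted(degrees, key=degrees.get)
    return dict(zip(nodes_by_degree, sorted(values)))

degrees = {"a": 5, "b": 1, "c": 3}   # hypothetical node degrees
ages = [21, 34, 58]                  # hypothetical property values
print(assign_correlated_property(degrees, ages))
# → {'b': 21, 'c': 34, 'a': 58}: higher-degree nodes receive larger values
```

A real generator would additionally have to preserve the target value distribution and scale to large graphs, which is where the paper's contribution lies.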
An early look at the LDBC Social Network Benchmark's Business Intelligence workload
In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload, which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, which is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.
The LDBC Graphalytics Benchmark
In this document, we describe LDBC Graphalytics, an industrial-grade
benchmark for graph analysis platforms. The main goal of Graphalytics is to
enable the fair and objective comparison of graph analysis platforms. Due to
the diversity of bottlenecks and performance issues such platforms need to
address, Graphalytics consists of a set of selected deterministic algorithms
for full-graph analysis, standard graph datasets, synthetic dataset generators,
and reference output for validation purposes. Its test harness produces deep
metrics that quantify multiple kinds of system scalability, both weak and
strong, and robustness against failures and performance variability. The benchmark
also balances comprehensiveness with runtime necessary to obtain the deep
metrics. The benchmark comes with open-source software for generating
performance data, for validating algorithm results, for monitoring and sharing
performance data, and for obtaining the final benchmark result as a standard
performance report.
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms
In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms.
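The abstract does not enumerate the six algorithms here, but a breadth-first search computing per-vertex distances is representative of the deterministic, full-graph analyses such benchmarks exercise: its output is uniquely determined by the input, so a harness can validate it against reference results. The sketch below is generic, not Graphalytics' reference implementation.

```python
from collections import deque

def bfs_distances(adj, source):
    """Full-graph BFS: hop distance from `source` to every reachable
    vertex. Deterministic output makes it easy to validate against
    precomputed reference results, as a benchmark harness would."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs_distances(adj, 0))
# → {0: 0, 1: 1, 2: 1, 3: 2}
```

On a platform under test, the same algorithm would run distributed or in parallel; the benchmark compares those runs against reference output like the dictionary above.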
The Linked Data Benchmark Council (LDBC): Driving competition and collaboration in the graph data management space
Graph data management is instrumental for several use cases such as
recommendation, root cause analysis, financial fraud detection, and enterprise
knowledge representation. Efficiently supporting these use cases yields a
number of unique requirements, including the need for a concise query language
and graph-aware query optimization techniques. The goal of the Linked Data
Benchmark Council (LDBC) is to design a set of standard benchmarks that capture
representative categories of graph data management problems, making the
performance of systems comparable and facilitating competition among vendors.
LDBC also conducts research on graph schemas and graph query languages. This
paper introduces the LDBC organization and its work over the last decade.
The LDBC Social Network Benchmark
The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an
effort intended to test various functionalities of systems used for graph-like
data management. For this, LDBC SNB uses the recognizable scenario of operating
a social network, characterized by its graph-shaped data. LDBC SNB consists of
two workloads that focus on different functionalities: the Interactive workload
(interactive transactional queries) and the Business Intelligence workload
(analytical queries). This document contains the definition of the Interactive
Workload and the first draft of the Business Intelligence Workload. This
includes a detailed explanation of the data used in the LDBC SNB benchmark, a
detailed description for all queries, and instructions on how to generate the
data and run the benchmark with the provided software. For the repository
containing the source code of this technical report, see
https://github.com/ldbc/ldbc_snb_doc